ABSTRACT


This  report  describes  the  continuing  development  of  preprocessing, 
classification,  and  context  analysis  techniques  for  hand-printed  text, 
which  are  advancing  at  an  accelerating  pace. 

Experiments have been continued with the Piecewise-Linear learning
machine, using the outputs of two preprocessors: the PREP 24A simulation
of the 1024-image optical preprocessor, and the CALMMASK preprocessor,
which employs both edge-detecting and corner-detecting masks. A new low
test error rate for classification has been achieved on hand-printed
alphabets of FORTRAN characters.

Statistics  of  the  performance  of  the  learning  machine  during  a  single 
testing  iteration  are  presented,  and  shed  light  on  several  important 
questions,  such  as  the  distribution  of  rankings  of  the  desired  character 
category  when  it  is  not  in  first  place. 

A  discussion  of  the  preprocessing  methods  used  in  the  topological 
approach  to  preprocessing  and  classification  is  begun. 

The  initial  development  of  a  FORTRAN  syntax  analyzer  is  described. 

A  milestone  has  been  reached  with  the  passage  of  a  small  sample  of  actual 
FORTRAN  text  from  a  coding  sheet  through  the  scanning,  preprocessing, 
classification,  and  syntax-analysis  programs. 
I  INTRODUCTION


The development of preprocessing, classification, and context-analysis
techniques for hand-printed text is progressing at an accelerating pace.

Section II of this report describes experiments with two
template-matching preprocessors and the CALM simulation of the
Piecewise-Linear Learning Machine. One series continues the nine-view
experiments with the outputs of the PREP 24A simulation of the 1024-image
optical preprocessor. A new low test error rate is achieved as the
training set is expanded. In a new series of experiments, CALM is used on
the outputs of a different simulated preprocessor (the CALMMASK program),
utilizing both edge-detecting and corner-detecting masks, together with
the features derived from Clemens' technique.

In a new line of data analysis, detailed statistics were developed for
the performance of the learning machine during a single testing
iteration. These statistics, discussed in Sec. II, shed light on several
questions important to both the classifier and the context analyzer; for
example, the distribution of rankings of the desired character category
when it is not in first place.

A  discussion  of  the  preprocessing  methods  used  in  the  topological 
approach  to  preprocessing  and  classification  (formerly  called  the  AD 
HOC  approach)  is  begun  in  Sec.  III.  An  improved  classification  routine 
is  being  developed,  and  it  is  planned  that  extensive  further  discussion 
of  these  methods  will  be  presented  in  the  next  report. 

The initial development of a FORTRAN syntax analyzer, which is the
heart of the context analyzer, is described in Sec. IV. A milestone
has been reached with the passage of a small sample of actual FORTRAN text
from a coding sheet through the scanning, preprocessing, classification,
and syntax-analysis programs. The results of this experiment (presented
in Sec. IV) indicate the power of the syntax analysis in cleaning up
text with misclassified characters.


II  EXPERIMENT WITH TWO TEMPLATE-MATCHING PREPROCESSORS
AND THE PIECEWISE-LINEAR LEARNING MACHINE


A.  Further Experiments with the Edge-Detecting Preprocessor and the
Piecewise-Linear Learning Machine

We have continued the series of experiments described in the Second
and Third Quarterly Reports with two additional experiments. The basic
feature vectors used in these experiments, as in the ones described
previously, were the nine-view, 84-bit binary vectors produced by the
PREP 24A simulation of the 1024-image optical preprocessor. Each of the
84 bits specifies the detection of an edge of a certain orientation in
a certain region of the image field. The classifier used, as before,
was the CALM (Collected Algorithms for Learning Machines) simulation
of the 46-category Piecewise-Linear Learning Machine, with two dot
product units per category.
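
The following is a minimal sketch of how a machine of this kind could
assign a category to a pattern. It assumes (the report does not spell this
out) that a category's response is the larger of its two dot-product-unit
sums and that the category with the largest response is chosen. The
function names, the weights, and the three-category toy machine are
hypothetical and serve only as an illustration.

```python
from typing import Sequence


def dpu_sum(weights: Sequence[float], pattern: Sequence[int]) -> float:
    """Dot product of one DPU's weight vector with a binary pattern."""
    return sum(w * x for w, x in zip(weights, pattern))


def category_responses(machine, pattern):
    """machine[c] holds the weight vectors (DPUs) of category c.

    Assumption: a category's response is the maximum of its DPU sums.
    """
    return [max(dpu_sum(w, pattern) for w in dpus) for dpus in machine]


def classify(machine, pattern) -> int:
    """Index of the category with the largest response."""
    responses = category_responses(machine, pattern)
    return max(range(len(responses)), key=responses.__getitem__)


# Hypothetical machine: 3 categories, two DPUs each, 4-bit patterns.
toy_machine = [
    [[1, 0, 0, 0], [0, 1, 0, 0]],
    [[0, 0, 2, 0], [0, 0, 0, 1]],
    [[-1, -1, -1, -1], [1, 1, 1, 1]],
]
```

The machine used in the experiments would have 46 such categories and
weight vectors as long as the feature vectors (84 bits per view).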

1.  PREP-CALM  Experiment  8 

This was a re-run of Experiment 3 (described in Sec. II of the
Second Quarterly Report), in which we added to the feature vectors the
24 bits generated by Clemens' technique (also described in the Second
Quarterly Report). The patterns used in Experiment 3 were 9-view,
84-bit patterns. In Experiment 8, each of the single-view patterns was
augmented by the addition of 24 bits, broken into eight segments of three
bits each. Within each of the eight segments, the three bits were used
to encode the number of occurrences of extrema of the figure boundary (in
X or Y) in one of the four quadrants of the figure.

The result of combining the edge-detection data with the Clemens'
technique data was thus a set of nine feature vectors (views) for each
character. The feature vectors each had 108 (84 + 24) components. The 24
Clemens' bits were the same throughout all nine views, whereas the
edge-detection bits varied.
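
The construction of the 108-component vectors can be sketched as follows.
This is an illustration only: the report says that three bits encode each
extrema count but does not give the code used, so the plain binary
encoding (saturating at 7) is an assumption, as are all function names.

```python
def encode_count(count: int, bits: int = 3) -> list:
    """Assumed encoding: plain binary, most significant bit first,
    saturating at the largest representable value (7 for 3 bits)."""
    count = min(count, (1 << bits) - 1)
    return [(count >> shift) & 1 for shift in range(bits - 1, -1, -1)]


def clemens_bits(extrema_counts) -> list:
    """extrema_counts: eight counts (X and Y extrema in each of the
    four quadrants). Returns 8 segments x 3 bits = 24 bits."""
    if len(extrema_counts) != 8:
        raise ValueError("expected eight extrema counts")
    bits = []
    for count in extrema_counts:
        bits.extend(encode_count(count))
    return bits


def augment_views(edge_views, extrema_counts) -> list:
    """Append the same 24 Clemens bits to every 84-bit edge view."""
    extra = clemens_bits(extrema_counts)
    return [list(view) + extra for view in edge_views]
```

Because the appended bits do not depend on the view, the nine augmented
vectors of a character differ only in their first 84 components.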
The results of the learning-machine experiment on this set of feature
vectors are presented in Fig. 1. The training error rate decreased to
23.4% in five iterations. The one-view independent test error rate
dipped to 33.9%, then rose to 39.9% at Iteration 4. The nine-view test
error rate was calculated at Iteration 3 (22.6%) and at Iteration 5
(23.7%).

(The graphs of Fig. 1 and similar figures are prepared by a separate
small program for the SDS 910 computer, called ERROR GRAPH. The
training error rate is the generally lower curve, whose numerical values
are listed below the curve. The test error rate is the other curve. The
precision of plotting the ordinate values is limited to half-line spacing
vertically by the use of the computer's typewriter for preparing the graph;
each vertical half-space corresponds to 1.2 or 1.3%.)

The results of Experiment 8 may be compared with those of Experiment 3,
in which the training error rate reached 36% in 5 iterations, the one-view
test error rate reached 45%, and the nine-view test error rate was 23%
(Fig. 2). We see that the addition of the Clemens' technique bits in the
present experiment has considerably improved the one-view training and
test error rates during the first five iterations, but has had essentially
no effect on the important nine-view test error rate. This result
would appear to reflect the fact that the new bits, while contributing
valuable information to each view, do not contribute correspondingly to
the majority-voting nine-view recognition process, because the bits are
the same in all views. The improvement in recognition rate using nine
views may be thought of as resulting from the outvoting of a "bad" view
or views by the others, and this cannot happen when the information in
all views is the same.
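
The voting argument can be illustrated with a short sketch. The function
below is a hypothetical rendering of majority voting over per-view
decisions, not the CALM code; in particular, the handling of ties (the
tied category seen first wins) is an assumption.

```python
from collections import Counter


def majority_vote(view_decisions):
    """Return the category chosen by the most views.

    Assumption: among categories with equal counts, the one that
    appears first in view_decisions wins.
    """
    counts = Counter(view_decisions)
    return counts.most_common(1)[0][0]


# Two "bad" views (D, Q) are outvoted by the views that answer O.
varied_views = ["O", "O", "D", "O", "Q", "O", "O", "D", "O"]

# If every view carried identical information, every view would give
# the same answer, right or wrong, and voting could not correct it.
identical_views = ["D"] * 9
```

With the varied views the vote returns "O"; with nine identical views the
vote can only repeat the single-view decision.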

In conclusion, we found that the extra information carried in the
Clemens' technique bits did improve the single-view classification of
patterns, but that without a "9-view generalization" of the Clemens'
technique the improvement did not carry over noticeably to 9-view
classification.
FIG. 2  PREP-CALM EXPERIMENT 3 (LEARNING CURVES)

2.  PREP-CALM Experiment 9

Experiment 9 was a re-run of Experiment 3 with an expanded training
set. The patterns used were the same as those in Experiment 3: 84-bit,
nine-view patterns. This time, in addition to the three FORTRAN
alphabets from each of twelve authors for training and four authors for
testing, three alphabets from each of eight more training authors had been
preprocessed through PREP 24A.

The results for Experiment 9 are presented in Fig. 3. The training
error rate decreased to 31.4% at Iteration 10. The one-view test error
rate ranged between 38 and 42% from Iteration 3 through Iteration 10.
The nine-view test error rate was 18.8% at Iteration 5 and 19.6% at
Iteration 10. These values represent a new low in test error rate for the
FORTRAN characters, and, apart from statistical fluctuations, appear to
be a couple of percent lower than the results of Experiment 3. It may be
noted that the training and test error curves are quite close together,
indicating that the expanded training set is largely successful in
representing the test data.

B.  Experiments with the CALMMASK Preprocessor and the Piecewise-Linear
Learning Machine

1.  The CALMMASK Preprocessing Program

As an aid to the development of new templates (or masks) for
preprocessing, and new structures combining these templates, a program
called CALMMASK was written for the SDS 910. CALMMASK implements
simulated optical masks of the type used in the 1024-image preprocessor
and the PREP 25A simulation (for example, edge detectors). CALMMASK allows
additional flexibility in the use of these templates. The shapes and
threshold values of individual template types may be specified; several
templates of the same or different types may then be combined by logical
OR-ing and AND-ing into a feature; and features may be replicated at
various locations on the pattern image field. Parameters controlling
all of these options are under direct control from the computer console.
CALMMASK exists in two versions. The "interactive" version allows
an experimenter to design features on-line by specifying them at the
console, observing their behavior when presented with test patterns, and
modifying them at will. The "production" version provides a more efficient
program for processing large quantities of patterns through an
already-designed preprocessor.

The CALMMASK feature set used in the experiments to be described
here was as follows: There were 16 types of template. Twelve of these
were edge detectors similar to those employed in PREP 24A, oriented at
each 30° of the compass. The remaining four were corner detectors,
designed to detect the corners formed by the meeting of a vertical and
a horizontal stroke (Fig. 4). Each corner template had 14 cells with a
weight of +1, 6 cells with a weight of -1, and a threshold of 12. It
may be noted that the templates are more tolerant of the orientation of
the vertical stroke than of the horizontal, reflecting the characteristics
of actual printing.

FIG. 4  CORNER-DETECTING TEMPLATES
(a) NW CORNER   (b) NE CORNER   (c) SW CORNER   (d) SE CORNER

Each of the 16 template types was placed in each of the four
quadrants of the image field, giving a total of 64 pattern components
(features) in the output of CALMMASK. Within each quadrant the template was
presented in every vertical and horizontal location, and a response from
the template in any location caused a positive response for the
corresponding feature. (In other words, each feature was a many-way OR
function of all the responses for the locations throughout the quadrant.)
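
The thresholded template and the many-way OR over a quadrant can be
sketched as below. This is an illustration under stated assumptions: the
template is taken to respond when its weighted sum reaches the threshold
(the report gives the threshold value but not the comparison used), a
placement is identified by its top-left cell, and the 2-by-2 toy template
and 4-by-4 image are hypothetical stand-ins for the 20-cell corner
templates of Fig. 4.

```python
def template_response(image, template, top, left, threshold):
    """image: 2-D list of 0/1 cells; template: 2-D list of weights.

    Assumption: the template fires when the weighted sum of the
    covered cells is at least the threshold.
    """
    total = 0
    for r, row in enumerate(template):
        for c, weight in enumerate(row):
            total += weight * image[top + r][left + c]
    return total >= threshold


def quadrant_feature(image, template, threshold, rows, cols):
    """Many-way OR: 1 if the template fires at any placement whose
    top-left cell lies in the given row and column ranges."""
    t_h, t_w = len(template), len(template[0])
    for top in rows:
        for left in cols:
            if top + t_h > len(image) or left + t_w > len(image[0]):
                continue  # placement would run off the image field
            if template_response(image, template, top, left, threshold):
                return 1
    return 0


# Hypothetical toy NW-corner template and image.
toy_template = [[1, 1],
                [1, -1]]
toy_image = [[0, 0, 0, 0],
             [0, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 0, 0, 0]]
```

Applying 16 template types to each of four quadrants in this way would
yield the 64 binary features described above.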

The patterns were not translated before presentation to the
templates, as was the case with PREP 24A; thus, only one-view feature
vectors were obtained from CALMMASK. It was expected that the presentation
of the templates in every location would have much the same effect as the
translation of the patterns to give the nine-view PREP 24A feature sets.
One purpose of the experiments was to compare the two approaches to
translation invariance; the other purpose was to see the effect of the
corner-detecting templates.


2.  MASK-CALM Experiment 2

Following a shakedown experiment, a full set of patterns was
preprocessed with CALMMASK and presented to the CALM simulation of the
Piecewise-Linear learning machine. The 24 Clemens' technique bits
described above were added to the patterns as they were presented to CALM,
forming feature vectors of 88 (64 + 24) bits. In this MASK-CALM
Experiment 2 the training and testing sets were the same as in Experiment 3
of the previous series, which used patterns preprocessed by PREP 24A.
Thus, a direct comparison is possible. The training set consisted of
three FORTRAN alphabets from each of twelve authors; and the test set,
of three alphabets from each of four authors.
Figure 5 shows the results of the experiment. The training error
rate decreased to 2.9% in five iterations. The training error rate during
PREP-CALM Experiment 3 never improved beyond 30% (however, it must
be remembered that only one-view patterns were used in the present
instance, so the identical feature vector was presented at each iteration,
forming an easier training problem). The test error rate decreased to
25.2%, then rose to 27.0% at Iteration 5. These rates may be compared
with the 23% test error rate of PREP-CALM Experiment 3 at Iteration 4.
The single-view pattern vectors from CALMMASK performed almost as well
as the nine-view vectors from PREP 24A.

3.  MASK-CALM  Experiment  3 

In MASK-CALM Experiment 3, six more authors (18 alphabets) were
added to the training set. Other details stayed the same as in the
previous experiment. As shown in Fig. 6, the training error rate
decreased to 8.1% in four iterations, and the test error rate reached
24.6%.

4.  MASK-CALM Experiment 4

In Experiment 4, the first six authors (18 alphabets) in the pattern
file were used for testing, and the seventh through twenty-second authors
for training. Again, all other details of the experiment were the same
as in the two previous experiments. The experiment was carried for ten
iterations to check for any extra long-term improvement in the test
error rate. Figure 7 shows that the test error rate flattened out at
22 to 23% after the third iteration. The training rate reached 3.5%.

Comparison of Experiments 3 and 4 with Experiment 2 shows that the
increased training set has improved the test error rate on the CALMMASK
patterns by a small amount. To date, the best error rate in the
experiments on the CALMMASK patterns (22% in Experiment 4) has not matched
the best 9-view rate (19% in Experiment PREP-CALM 9).

5.  MASK-CALM Experiment 5

Experiment 5 was performed to isolate the effect of the Clemens'
technique bits, which had been included with the template features
throughout the other MASK-CALM experiments. In Experiment 5, only the
64 template feature bits were used. The training and testing sets were
the same as in Experiment 4. Figure 8 shows that the error rates were
increased by the deletion of Clemens' technique bits: in four iterations,
the training error rate reached 14.5% and the test error rate reached
32.2%.

FIG. 5  MASK-CALM EXPERIMENT 2
[Learning curves: training and test error rates versus iterations. Run on
88-bit patterns from CALMMASK plus Clemens; 12 training authors, 4 test
authors.]

FIG. 6  MASK-CALM EXPERIMENT 3
[Learning curves: training and test error rates versus iterations. Run on
88-bit patterns from CALMMASK plus Clemens; 18 training authors, 4 test
authors.]

FIG. 7  MASK-CALM EXPERIMENT 4
[Learning curves: training and test error rates versus iterations. Run on
88-bit patterns from CALMMASK plus Clemens; trained on authors 7-22,
tested on authors 1-6; 828 testing patterns.]

C.  Examination of Learning-Machine Statistics During a Test Iteration

A small modification was made to the CALM program, which allowed
certain statistics concerning the performance of the Piecewise-Linear
learning machine to be gathered during the running of an iteration.
Raw statistics gathered included the values of the largest and second
largest category responses (Dot Product Unit sums) for each pattern, the
ranking of the desired (true) category, and its sufficiency or deficiency.

The ranking of the desired category can range from 1 to 46. It is 1
for a pattern if and only if the pattern is correctly classified. If the
ranking is 1, the sufficiency is defined as the difference between the
DPU sum of the desired category and the largest of the other sums (which
will belong to the second-ranked category). If the pattern is in error,
the deficiency is defined as the difference between the largest sum (the
one in the chosen category) and the sum for the desired category.
The sufficiency (or deficiency) measures the closeness of the machine's
decision, and thus can be interpreted as a measure of confidence in the
category chosen. (If the ranking of the desired category is 3 or greater,
the deficiency does not show how close the second choice was to the
first choice, but this is a small point.)
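
These definitions can be restated as a short sketch. The function name
and the example responses are hypothetical, and the treatment of exact
ties (a tied category does not push the desired category down the
ranking) is an assumption, since the report does not discuss ties.

```python
def rank_and_margin(responses, desired):
    """responses: category responses (DPU sums) for one pattern.
    desired: index of the true category.

    Returns (ranking, kind, value), where kind is "sufficiency" for a
    correctly classified pattern and "deficiency" otherwise.
    """
    desired_sum = responses[desired]
    # Ranking = 1 + number of categories with a strictly larger sum.
    ranking = 1 + sum(1 for r in responses if r > desired_sum)
    if ranking == 1:
        others = [r for i, r in enumerate(responses) if i != desired]
        return ranking, "sufficiency", desired_sum - max(others)
    return ranking, "deficiency", max(responses) - desired_sum


# Hypothetical responses for a four-category machine.
example = [310, 520, 450, -100]
```

For the example responses, a desired category of 1 gives ranking 1 with a
sufficiency of 70, while a desired category of 0 gives ranking 3 with a
deficiency of 210.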

Since the significance of individual DPU sums is clearer in a one-view
than in a nine-view experiment, we chose a one-view test iteration
for analysis: Test Iteration 4, from MASK-CALM Experiment 4. Figure 7
shows the error rate for this iteration to be 23.1%.

The distribution of rankings of the desired category is shown in
Table I. Rankings 1 and 2 include the correct category almost 90% of the
time; rankings 1 through 4 include the correct category 95% of the time.
FIG. 8  MASK-CALM EXPERIMENT 5
[Learning curves. Run on 64-bit patterns from CALMMASK; trained on
authors 7-22, tested on authors 1-6; 828 testing patterns.]
Table I

MASK-CALM EXPERIMENT 4, TEST ITERATION 4:
RANKINGS OF DESIRED CATEGORY

Ranking    Occurrence       %      Cumulative %
   1          637         76.9        76.9
   2           96         11.6        88.5
   3           40          4.8        93.3
   4           14          1.7        95.0
   5            9          1.1        96.1
   6            7          0.8        96.9
   7            2          0.2        97.1
   8            3          0.4        97.5
   9            1          0.1        97.6
  10            5          0.6        98.2
  12            2          0.3        98.5
  13            3          0.4        98.9
  14            1          0.1        99.0
  17            2          0.3        99.3
  19            5          0.6        99.9
  27            1          0.1       100.0
TOTAL         828        100.0       100.0
Thus, presenting the first few choices to the context analyzer leads to
a very high probability of including the correct category.
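
The inclusion probabilities quoted from Table I can be recomputed from
the occurrence counts. The sketch below uses the counts as they appear in
the table; the function name is hypothetical.

```python
# Occurrences of each ranking of the desired category (Table I).
RANK_COUNTS = {1: 637, 2: 96, 3: 40, 4: 14, 5: 9, 6: 7, 7: 2, 8: 3,
               9: 1, 10: 5, 12: 2, 13: 3, 14: 1, 17: 2, 19: 5, 27: 1}


def top_k_inclusion(rank_counts, k):
    """Percentage of patterns whose desired category is ranked k or better."""
    total = sum(rank_counts.values())
    included = sum(n for rank, n in rank_counts.items() if rank <= k)
    return 100.0 * included / total
```

With these counts, passing the first two choices to the context analyzer
would include the correct category for 733 of the 828 test patterns, and
passing the first four would include it for 787.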


FIG.  9  HISTOGRAM  OF  SUFFICIENCIES 

Figure 9 is a histogram of the sufficiencies of the correctly
classified patterns. Figure 10 is a histogram of the deficiencies of the
incorrectly classified patterns, broken down according to ranking.
Figure 11 is a histogram of values of the maximum DPU sum formed for
every pattern, broken into two parts: for the correctly classified
patterns, and for the patterns in error. A number of interesting
conclusions can be drawn from these graphs.

The first, and quite surprising, fact to be observed from the
histograms is the great range of the maximum DPU sums (from approximately
280 to 1310), sufficiencies (up to 800), and deficiencies (up to 600). The
CALM program records and prints out the overall maximum DPU sum formed
during an entire iteration, as a check against overflow in the computer.
In this case, the largest sum was 1312, and this figure is in the typical
range for experiments run with CALM. We may assume that the most
negative DPU sum formed during the iteration was comparable in magnitude.
This means that all of the DPU sums formed for each pattern lie in an
interval of length approximately 2500.

FIG. 10  HISTOGRAMS OF DEFICIENCIES vs. RANKING

FIG. 11  HISTOGRAMS OF MAXIMUM DPU SUMS
[Horizontal axis: maximum DPU sum. Separate histograms for correctly and
incorrectly classified patterns.]

If the 46 category responses in the machine were randomly distributed
in the range -1200 to 1300, the average numerical interval between
responses would be about 50. Even with fluctuations, we would expect the
sufficiencies, most of the deficiencies, and the variation in maximum
sums all to range up to only 100 or 200. Yet we find spreads of 600
to 1000, and occurrences such as a pattern for which not one of the
DPU sums exceeded (approximately) 280. Such behavior is quite contrary
to our intuition, which expected much tighter distributions of these
quantities. Since the actual performance of the learning machine is an
a priori fact, we do not infer that the observed distributions are in
themselves "good" or "bad"; they are merely surprising.
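
The "about 50" figure can be checked with a small simulation. The sketch
below assumes, as the argument above does for the sake of comparison,
that the 46 responses are independent and uniformly distributed over the
interval; the function name, seed, and trial count are arbitrary choices.
For n independent uniform values on an interval of length L, the mean gap
between neighboring ordered values is L / (n + 1).

```python
import random


def mean_top_gap(n_categories=46, low=-1200.0, high=1300.0,
                 trials=5000, seed=0):
    """Average gap between the largest and second-largest of
    n_categories independent uniform draws on [low, high]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        draws = sorted(rng.uniform(low, high) for _ in range(n_categories))
        total += draws[-1] - draws[-2]
    return total / trials


# Theoretical mean spacing under the uniform assumption: 2500 / 47.
expected_gap = (1300.0 - (-1200.0)) / (46 + 1)
```

Under this assumption the expected sufficiency is a little over 50, which
is the scale against which the observed spreads of several hundred appear
surprising.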

A  second  observation  is  that  the  distribution  of  maximum  DPU  sums 
is  higher,  on  the  average,  when  the  pattern  is  correctly  classified  than 
when  it  is  not.  It  might  be  possible  to  use  the  maximum  sum  to  adjust 
the  confidence  measures  of  the  chosen  and  competing  categories,  unless 
the  maximum  sum  is  so  correlated  with  the  sufficiency  and  deficiency  that 
there  is  little  or  no  independent  information  to  be  gained. 

Turning to the histogram of sufficiencies, we find a tendency,
which appears to be statistically significant, for depletion in the
region near zero. Since the sufficiency and deficiency are measures
of the same quantity (namely, desired-category response minus the maximum
of other responses), we can further study this effect by combining the
deficiency histograms of Fig. 10, reversing the horizontal axis, and placing
the resulting histogram beside that of Fig. 9. This is done in Fig. 12.
It is evident that the dropoff in sufficiencies near zero is related to
the continuing dropoff of occurrences with increasing deficiency. (Since
there is no reason to expect a discontinuity at zero, the jump there is
probably a statistical fluctuation.)
FIG. 12  COMBINED HISTOGRAMS OF DEFICIENCIES AND SUFFICIENCIES

During MASK-CALM Experiment 4, the training margin was set to 88
(the number of pattern components). This value closely matches the value
of sufficiency at which the dropoff occurs. It is an attractive
hypothesis that the margin has tended to "push" sufficiencies above the 88
level, although this one example is only limited evidence. If the
hypothesis is true, the results are quite satisfying, because although the
training governed by the training margin was applied only to the training
patterns, we here see its effect in enhancing the decisions on the test
patterns.

Finally, the study of information such as that in Fig. 10 will be
of value in the future, when more is known about the needs of the context
analyzer. Figure 10 portrays the relation between the ranking of a
misclassified character and its deficiency. A study of relations such as
this will indicate, for example, how much weight should be given to the
ranking, and how much to the differences in DPU sums, when determining the
confidence measures to be assigned to each category.
III  TOPOLOGICAL PREPROCESSING OPERATIONS FOR HANDPRINTED
CHARACTER RECOGNITION


A.  Introduction

Let us propose, with the usual risk of oversimplification, the
following difference among methods of extracting feature information for
the recognition of graphical patterns such as handprinted characters.
On one hand are the "topological" preprocessing methods; on the other
hand, the "non-topological" ones.

The topological methods extract from the character image those types
of features that would be commonly used by people asked to describe the
shapes of characters. Typical descriptions are: "A letter P has a closed
loop on top, with a stroke sticking down from it on the left-hand side."
"The difference between a letter O and a letter D is that the O is round,
but the D has two corners on the left," and so on. Topological features
include strokes, loops, hollows, corners, curvatures, connections, etc.,
as well as the relative positions and orientations of the basic features.
In short, these features are primarily concerned with the geometrical and
topological components and relationships of the character as a whole.

We may characterize the non-topological methods, by contrast, as
those which derive information less related to the "natural" or intuitive
description of the character at the topological level. Clemens' technique
(described in Ref. 1* and in the Second Quarterly Report), in which the
x and y extrema of the contour of the character are recorded, is an
example of such a method. Integral geometry (Quarterly Reports 3 and 4
of the preceding Contract No. DA 36-039 SC-78343), in which statistical
measurements are made of the intersections of a pattern with randomly
chosen lines, is a prime example of a method seemingly unrelated to the
natural description of the character. The character-recognition
literature provides many more examples of non-topological
feature-extraction techniques, such as random sampling (e.g., Perceptron
and N-tuple) methods (Ref. 2), and the sequence of intersections of the
character with a scan line of fixed orientation (Ref. 3).† Finally, in
this framework, the extraction of features by edge-detecting masks (as
exemplified by the 1024-image preprocessor) falls in the non-topological
category.

* References are listed at the end of the report.
† The "Birdwatch" technique, developed by Rabinow Engineering Co.,
described in Ref. 1.

Non-topological preprocessing methods are often prompted by their
elegance and simplicity, and the convenience of a uniform approach. Most
such techniques are based on elegant or "clever" processes that are quite
simple conceptually, and that are correspondingly easy and straightforward
to implement in a computer program or in hardware. If a process generates
sufficient information to allow unique classification of well-formed
characters, it becomes a candidate for a preprocessing technique. The
major problem that confronts such methods arises when they are faced with
the ill-formed characters that do occur in actual input and must be handled.
The method based on a single organizing principle often seems to lack the
"ruggedness" to maintain the constancy of its outputs in the face of
character distortions and aberrations, and no corrective recourse is
available within the framework of the single uniform approach.

The topological preprocessing methods gain their appeal from the
fact that they use the same features used by humans in describing the
characters. It can then be hoped that when faced with distortions that
leave a character still recognizable by humans, such methods will preserve
information sufficient for classification. As a corollary, human
introspection together with observation of the system's operation can be
used as guides for designing, evaluating, and improving the preprocessor
and subsequent classifier.

B.  Current Status of the Topological Preprocessing and Classification
Program

A preliminary program for the preprocessing of handprinted characters
by the extraction of topological features was described in the Third
Quarterly Report under the heading, "AD HOC Preprocessing and
Classification of Characters." The preliminary program contained routines
for finding the connected components of a character image, its boundary,
convex hull, enclosures, concavities of the boundary, and spurs (strokes
that end at an isolated tip). The preliminary program consisted almost
entirely of these preprocessing routines. Only a fragmentary classification
routine, with a decision tree for handling single-stroke characters, had
been added.

An  extensively  modified  version  of  this  program,  called  TOPO  2,  is 
currently  being  written.  Changes  of  three  types  are  being  made  in  TOPO  2. 

First, needed improvements have been introduced into the
boundary-following and stroke-tracing routines. Second, a general cleanup
of the coding was undertaken, primarily to reduce running time and storage
requirements. Third, the decision tree approach to classification that
had been begun in the AD HOC program has been dropped in favor of
producing alternative classifications with confidence measures.

The  change  in  classification  procedure  is  important  in  two  respects. 

On  one  hand,  output  providing  alternative  classifications  and  their 
confidence  measures  is  vital  for  the  operation  of  the  syntax  and  context 
analyzer,  discussed  in  Sec.  IV  of  this  report.  But  in  addition,  it 
appears  that  the  new  procedure  will  be  much  easier  to  design  and  modify. 

In  the  decision-tree  approach  there  was  a  considerable  tendency  for  all 
but  the  most  conservative  decisions  to  send  characters  down  the  wrong 
branches  of  the  tree.  For  example,  a  seemingly  obvious  dichotomy  is  one 
between  characters  with  enclosures  and  those  without.  But  many  characters 
have spurious enclosures due to quantization noise, and many actual
enclosures are filled in. It is impossible to make even such basic
dichotomies without losing a considerable number of characters from their
proper branches. If the alternate branches of the decision tree are patched
up to handle the characters that fall into them, the program becomes
unmanageably complex.

In the confidence-measure approach, however, the decision is made
separately for each character category on the basis of all the preprocessed
information. Each case can be decided on the basis of its own merits, so
to speak. There is no pressure to make binary choices, like the
branching of a decision tree, if there is any significant possibility of
losing characters thereby. Furthermore, absolute decisions do not have
to be made in any case. The existence of the continuous-valued confidence
measure allows a gradual decrease of confidence in a given category
as the feature values depart more and more from the values expected for
that category. Thus, the natural and beneficial consequence of producing
confidence measures for the context analyzer is that the classifier is
allowed to express degrees of doubt, as it were, about placing a character
in a given category. This situation would seem to mirror the human
response to ill-formed text.
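
The contrast with a decision tree can be illustrated by a minimal sketch
in Python. It is not the TOPO 2 classification routine, whose details are
not given here. The feature set (counts of enclosures and spurs), the
expected values in EXPECTED, the linear penalty, and the function name
confidences are all assumptions made for illustration only.

```python
# Hypothetical expected feature values for three categories, given as
# (number of enclosures, number of spurs). These values are illustrative
# assumptions, not taken from TOPO 2.
EXPECTED = {"O": (1, 0), "C": (0, 2), "P": (1, 1)}


def confidences(features, expected=EXPECTED, penalty=30):
    """Return (category, confidence) pairs, most confident first.

    Confidence starts at 100 and drops by `penalty` for each unit by which
    an observed feature value departs from the expected one, so doubt
    grows gradually instead of a character being sent down a wrong branch.
    """
    scored = []
    for category, target in expected.items():
        distance = sum(abs(f - t) for f, t in zip(features, target))
        scored.append((category, max(0, 100 - penalty * distance)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# An "O" whose enclosure has been filled in: no enclosures, no spurs.
print(confidences((0, 0)))
```

A tree that first split on "has an enclosure" would lose this character;
here "O" merely receives a reduced confidence and remains the leading
alternative.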

The  addition  to  TOPO  2  of  a  classification  routine  embodying  these 
concepts  is  underway.  The  results  of  the  first  preliminary  tests  are 
most encouraging. We shall continue to implement the classification
routine (and add to the preprocessing as necessary) and report more
fully on results in the next report.

The  remainder  of  this  section  contains  the  first  half  of  a  discussion 
of  the  techniques  that  have  been  developed  for  topological  preprocessing. 

C. Discussion of Topological Preprocessing Operations

Topological  features  extracted  by  preprocessing  should  not  only  be 
"natural,"  but  should  also  meet  the  allied  criterion  of  "ruggedness." 

A rugged feature is one whose presence is not changed, and whose
characteristics are not greatly altered, by normal variations in the image
of a character in a given category. The processing routines used to find the
features must be tolerant of variations in the source characters and
distortions caused by the scanning process, if they are to produce rugged
features. If the image is affected by salt-and-pepper noise, for example,
a routine to find connected figure components must be able to reject small,
isolated figure elements.

The primary feature information concerning a character evidently resides
in the strokes forming the character. In fact, if the strokes are defined
as comprising the path(s) that the writing instrument follows in forming
the character, it is a tautology that the strokes contain all the feature
information. But, in a more practical sense, the stroke information
suffers two weaknesses: not all stroke information can be recovered, and
other types of features may convey equivalent information in more desirable
form.

We may contrast the available stroke information on the hand-printed
page with that of on-line input to a computer, through, for example, a
cathode ray tube and light-pen or a RAND tablet. Two characteristics of
on-line input are outstanding. First, time-sequence information and
even velocity information about the strokes are available. Second, the
strokes are line drawings; they have infinitesimal width. These
characteristics make the recognition of characters on-line an entirely
different problem from the off-line recognition of characters on a printed
page.

It is a point of major significance that an off-line printed character
must be recognized from its shape alone.

Full stroke information cannot be recovered from an off-line printed
character  image,  owing  to  the  overlapping  of  strokes  in  the  body  of  the 
figure  and  the  masking  of  the  stroke  path  by  the  finite  width  of  the 
stroke.  (Thus,  it  appears  that  an  important  quantitative  parameter  of  the 
difficulty  of  a  handprinted  character  recognition  problem  is  the  ratio 
of  stroke  width  to  character  size.)  We  are  led,  therefore,  to  define  the 
stroke  information  as  that  information  that  can  be  derived  from  the  image 
by  some  processing  routine,  and  to  look  for  auxiliary  forms  of  natural, 
rugged  feature  information. 

Two such feature types are concavities of the figure boundary, and
enclosures (holes) within the figure. Others are junctions, or blobs--
regions in which strokes come together to form nodes, masses, or areas of
confusion. The overall size and location of the character as a whole and
of its connected components are important features. We may also add to
the list features that are derivable from the stroke information:
directions, curvatures, and corners. Finally, the relations among features
can be features in themselves, such as the connections of strokes and
the relative placement of strokes, concavities, and enclosures. It
appears that the features just listed represent a natural, and profitable,
way of presenting handprinted characters.

We  turn  now  to  a  discussion  of  the  feature  types  and  the  computer 
routines  we  have  used  to  calculate  them  (working  from  a  24  X  24  binary 
matrix  representation  of  the  character  images). 

1. Figure Extent and Location

The subroutine EXTENT (NFIG, JT, JB, KL, KR) finds the indices of
the topmost (JT), bottommost (JB), leftmost (KL), and rightmost (KR)
figure points in the image NFIG. (Rows are numbered 1-24 going downward;
columns 1-24 from left to right.) NFIG may be a character, a connected
component, or any image at all. EXTENT is fast in operation because it
need merely scan the image once by rows (computer words) to find the top
and bottom of the figure, then scan the word it has collected meanwhile
(by OR-ing the rows of the image) to find the left and right boundaries.
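
The single-pass scan can be sketched in Python as follows. This is an
illustration of the idea, not the EXTENT routine itself: the image is
assumed to be a list of integers, one per row, with the most significant
of 24 bits standing for column 1, and the helper from_strings is a
hypothetical convenience for building such rows.

```python
WIDTH = 24  # columns per image row, one bit each


def from_strings(lines, width=WIDTH):
    """Build row words from strings of 0s and 1s (leftmost character = column 1)."""
    return [int(line.ljust(width, "0"), 2) for line in lines]


def extent(rows, width=WIDTH):
    """Return (JT, JB, KL, KR), 1-based, or None for an empty image."""
    jt = jb = None
    collected = 0                      # OR of all rows seen so far
    for j, word in enumerate(rows, start=1):
        if word:
            if jt is None:
                jt = j                 # first non-empty row is the top
            jb = j                     # last non-empty row is the bottom
            collected |= word
    if jt is None:
        return None
    # Highest set bit gives the leftmost column, lowest set bit the rightmost.
    kl = width - collected.bit_length() + 1
    kr = width - ((collected & -collected).bit_length() - 1)
    return jt, jb, kl, kr


image = from_strings(["", "", "0011", "0100"])
print(extent(image))
```

Only one pass over the rows is needed; the left and right limits come from
inspecting the single collected word.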

The  location  of  a  figure  is  determined  by  the  row  and  column  indices 
of  its  center.  The  center  of  an  object  is  typically  defined  as  its 
centroid  or  center  of  gravity.  Finding  the  centroid,  however,  requires 
a  lengthy  computation.  We  prefer  the  definition 

    JC = (JT + JB)/2
    KC = (KL + KR)/2

which  locates  the  center  of  the  smallest  rectangle  enclosing  the  figure. 
This  calculation  can  be  performed  far  faster  than  finding  the  centroid. 

It  generally  gives  values  close  to  the  centroid,  and  may  be  equally 
desirable  or  even  preferable  for  our  purpose. 
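
As a small numerical illustration, the Python sketch below compares the
center of the enclosing rectangle with the centroid for an L-shaped
figure. The function names and the example figure are assumptions made
here. True division is used; in FORTRAN integer arithmetic the quotient
(JT + JB)/2 would be truncated.

```python
def box_center(jt, jb, kl, kr):
    """Center (JC, KC) of the smallest rectangle enclosing the figure."""
    return (jt + jb) / 2, (kl + kr) / 2


def centroid(points):
    """Center of gravity of a list of (row, column) figure elements."""
    n = len(points)
    return sum(j for j, _ in points) / n, sum(k for _, k in points) / n


# An L-shaped figure occupying rows 1-3 and columns 1-3.
figure = [(1, 1), (2, 1), (3, 1), (3, 2), (3, 3)]
rows = [j for j, _ in figure]
cols = [k for _, k in figure]

print(box_center(min(rows), max(rows), min(cols), max(cols)))
print(centroid(figure))
```

The rectangle center needs only the four extent values, whereas the
centroid requires a sum over every figure element.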

2. Connectivity

A figure is connected if any two of its elements can be joined by
a chain of neighboring figure elements. Two definitions of "neighbor,"
and thus of connectivity, are at hand. We shall call them 4-connectivity
and 8-connectivity. In 4-connectivity, the neighbors (N) of an element (X)
are the four adjacent elements vertically and horizontally:

     N
    NXN
     N

In 8-connectivity, the four elements adjacent diagonally are included
as neighbors:

    NNN
    NXN
    NNN

Rosenfeld and Pfaltz describe the two types of connectivity in a
recent article (Ref. 4). They point out the "paradox," or inelegance, in the
connectivity of figure (1) and ground (0) elements related thus:

    1 0
    0 1

The figure and ground are both 8-connected, but neither is 4-connected.
The authors fail to remark on the satisfying duality that results from
specifying one entity to be governed by 4-connectivity and the other
by 8-connectivity, so that only one is connected at a crossover.

It  is  generally  in  our  interest  to  maximize  figure  connectivity, 
so  we  choose  the  figure  to  be  governed  by  8-connectivity,  and  the  ground 
by  4-connectivity.  (Often  a  single  marginal  figure  element  will  lie 
diagonally  adjacent  to  the  body  of  the  figure,  and  we  can  thus  avoid 
having to treat it as a separate figure. Marginal elements and isolated
elements due to salt-and-pepper noise can be eliminated by a smoothing
operation (Ref. 5). Since we seldom receive such noise from the vidicon
camera, we avoid the smoothing operation, which represents extra work and
is not without some danger of losing significant detail.)

Our choice of figure and ground connectivities means that concavities
and enclosures, which are ground areas, will be 4-connected. Thus, in
the following image, the figure is connected but the ground is not.
One ground element is an enclosure in the figure.

    1 1 0 0
    1 0 1 0
    1 1 0 0
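
The two definitions can be checked on small images with a short Python
sketch. It counts connected components by a simple flood fill over sets
of (row, column) cells. The function names and this representation are
assumptions made for illustration, not the routines described in this
report.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def components(cells, offsets):
    """Count the connected components of a set of (row, col) cells."""
    remaining = set(cells)
    count = 0
    while remaining:
        stack = [remaining.pop()]      # start a new component
        count += 1
        while stack:
            j, k = stack.pop()
            for dj, dk in offsets:
                neighbor = (j + dj, k + dk)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    stack.append(neighbor)
    return count


def split(image):
    """Split rows of '0'/'1' characters into figure and ground cell sets."""
    figure, ground = set(), set()
    for j, row in enumerate(image):
        for k, value in enumerate(row):
            (figure if value == "1" else ground).add((j, k))
    return figure, ground


# The crossover: with figure 8-connected and ground 4-connected,
# only the figure is connected.
figure, ground = split(["10", "01"])
print(components(figure, N8), components(ground, N4))

# The enclosure example: one figure component, two ground components.
figure, ground = split(["1100", "1010", "1100"])
print(components(figure, N8), components(ground, N4))
```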

3. Subroutines CONN8 and CONN4

Subroutines CONN8 and CONN4 embody the basic connectivity operation.
CONN8 works with 8-connectivity; CONN4, with 4-connectivity. Their
action is otherwise identical, so only CONN8 will be described. The
function of CONN8 is to find those connected components of a figure (FIGA)
that include elements of another figure (FIGB). The image composed of the
components found is returned by the subroutine as (FIGC). CONN8 has two
modes of operation:

    CONN8 (FIGA, FIGC, FIGB, 0)

and

    CONN8 (FIGA, FIGC, J, K)

where J and K range from 1 to 24. In the second mode, the figure in FIGB
is taken to consist of a single element located at (J, K).

The operation of CONN8 begins with storing in FIGC an image that is
the element-wise logical product of FIGA and FIGB:

    FIGC(J,K) = FIGA(J,K) AND FIGB(J,K)

This image contains all the 1-bits (usually figure elements) common to
FIGA and FIGB. In the second step, an image is formed which includes as
1-bits all the elements of FIGC and all the neighbors of all the elements
of FIGC. This image represents the growth of FIGC by one unit over all
its perimeter. In the third step the logical product of this "growing"
image with FIGA is returned to FIGC, thus restraining the growth of FIGC
to elements within FIGA. The second and third steps are repeated until
no new elements are generated in FIGC. At this point, FIGC has filled
out the connected components of FIGA containing elements of the original
FIGC.

The elementary operations required by CONN8 are the element-wise
logical AND and logical OR functions performed over two 24 X 24 image
fields, and operations that shift an image field right, left, up, and
down. Such operations are available to us separately in the form of
subroutines:

    ANDFIG (INFIG1, INFIG2, OUTFIG)    (1 ∩ 2)
    ORFIG  (INFIG1, INFIG2, OUTFIG)    (1 ∪ 2)
    XORFIG (INFIG1, INFIG2, OUTFIG)    (1 ⊕ 2)
    DIFFIG (INFIG1, INFIG2, OUTFIG)    (1 ∩ ¬2)

and

    RSHFIG (INFIG, COUNT, OUTFIG)
    LSHFIG (INFIG, COUNT, OUTFIG)
    USHFIG (INFIG, COUNT, OUTFIG)
    DSHFIG (INFIG, COUNT, OUTFIG)

For the sake of speed and compactness in CONN8, however, these operations
are performed directly by machine-language coding.
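
The three steps of CONN8 can be sketched in Python as follows. This is an
illustration under stated assumptions, not the machine-language routine:
images are lists of integer row words with the most significant of 24
bits as column 1, the result is returned rather than stored in an output
argument, and the names grow8, conn8, and rows_from are chosen here.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1


def rows_from(lines, width=WIDTH):
    """Build row words from strings of 0s and 1s (leftmost character = column 1)."""
    return [int(line.ljust(width, "0"), 2) for line in lines]


def grow8(fig):
    """Add to each element its eight neighbors (step 2)."""
    # Spread each row left and right, then OR in the spread rows above and below.
    spread = [(w | (w << 1) | (w >> 1)) & MASK for w in fig]
    grown = []
    for j, word in enumerate(spread):
        if j > 0:
            word |= spread[j - 1]
        if j + 1 < len(spread):
            word |= spread[j + 1]
        grown.append(word)
    return grown


def conn8(figa, figb):
    """Return the components of figa that contain elements of figb."""
    figc = [a & b for a, b in zip(figa, figb)]             # step 1
    while True:
        grown = grow8(figc)                                # step 2
        restrained = [g & a for g, a in zip(grown, figa)]  # step 3
        if restrained == figc:                             # no new elements
            return figc
        figc = restrained


figa = rows_from(["1100011", "0100001", "0010000"])
seed = rows_from(["1000000", "", ""])   # second mode: one element at (1, 1)
print(conn8(figa, seed) == rows_from(["1100000", "0100000", "0010000"]))
```

The seed reaches the diagonally adjacent element in the third row because
growth is 8-connected; the separate component at the right is never
touched.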

The operations just listed are examples of parallel operations,
which can be applied in parallel to the elements of an image field (or
two) to produce an output image field. The attractiveness of parallel
operations in terms of speed is such that entire computers, notably the
ILLIAC III at the University of Illinois, have been devised with a bank
of processors capable of working in parallel. (Our own 1024-image
preprocessor performs a specialized type of parallel operation on the image
field.) It should be noted that a conventional computer such as the
SDS 910 is capable of partial parallel processing by using the logical
operations that deal in parallel with the bits of a computer word
representing a row of the image. An operation on a 24 X 24 element field
that would require 576 steps sequentially, or one step in a parallel
computer, can be performed in 24 steps by the conventional computer,
affording a considerable saving in time relative to purely sequential
operation.
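
The row-at-a-time character of these operations can be sketched in
Python. The function names follow the subroutine names in the text, but
the bodies are assumptions: each image is a list of integer row words
(most significant of 24 bits = column 1), and results are returned rather
than written to an OUTFIG argument.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1

# Each logical operation touches one word per row: 24 word operations
# for a 24 X 24 field, instead of 576 element operations.


def andfig(a, b):
    return [x & y for x, y in zip(a, b)]


def orfig(a, b):
    return [x | y for x, y in zip(a, b)]


def xorfig(a, b):
    return [x ^ y for x, y in zip(a, b)]


def diffig(a, b):
    """Elements of a that are not in b."""
    return [x & ~y & MASK for x, y in zip(a, b)]


def rshfig(a, count):
    """Shift right; elements pushed past column 24 are lost."""
    return [w >> count for w in a]


def lshfig(a, count):
    """Shift left; elements pushed past column 1 are lost."""
    return [(w << count) & MASK for w in a]


def ushfig(a, count):
    """Shift up by whole rows, filling with empty rows at the bottom."""
    return (a + [0] * count)[count:]


def dshfig(a, count):
    """Shift down by whole rows, filling with empty rows at the top."""
    return ([0] * count + a)[:len(a)]


field = [0b1100, 0b0011, 0b0000]
print(orfig(rshfig(field, 1), dshfig(field, 1)))
```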

The CONN8 routine, as it is actually programmed, also takes advantage
of the fact that each row of the growing FIGC is immediately available
for the calculation of the next row. This allows the connected region
formed in FIGC to cascade in one direction (downward) during the execution
of steps 2 and 3, above. If the original FIGB is at or near the top of
the appropriate connected component of FIGA, FIGC can be found in very
short order. This is a limited example of the sequential processing
discussed in the paper by Rosenfeld and Pfaltz (Ref. 4).

CONN8 and CONN4 are basic building blocks for other operations
described  below. 

Although we have associated CONN4 with the ground (rather than
figure) components of the image, CONN4 is programmed to work on regions
composed of elements with the value 1, as does CONN8. Since the figure
is normally assigned the value of 1 and the ground the value of 0, a
figure-ground complementation is necessary if CONN4 in its present form
is to be used on ground regions. This complementation can be
performed by the subroutine

    CMPFIG (INFIG, OUTFIG).

4. Figure Dissection

An  arbitrary  figure  can  be  dissected  into  its  4-  or  8-connected 
components  by  subroutine 

    DISE48 (INFIG, KOUNT, NFIGS, MAXKNT, MODE).

DISE48 places individual connected components in NFIGS, which is an
array of image fields, and returns the component count in KOUNT. DISE48
first searches the input image to find an element with value 1. (This
search can be performed by subroutine WNPT, which finds the northmost of
the westmost of the figure points.) Either CONN4 or CONN8 is then
called, depending on MODE, to find the entire connected component including
this element. This component is removed from the input image and placed
in the first image field of NFIGS. The process is repeated, filling
successive fields of NFIGS, until the input image is exhausted or MAXKNT-1
components have been dissected.
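
The dissection loop can be sketched in Python as follows. This is an
illustration, not DISE48 itself. It assumes images are lists of integer
row words (most significant of 24 bits = column 1), takes the seed
element to be the westmost element of the northmost non-empty row (any
element with value 1 would serve), and treats maxknt simply as an upper
limit on the number of components returned.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1


def rows_from(lines, width=WIDTH):
    return [int(line.ljust(width, "0"), 2) for line in lines]


def grow(fig, mode):
    """Grow by one element using 4- or 8-connectivity."""
    horiz = [(w | (w << 1) | (w >> 1)) & MASK for w in fig]
    # For 8-connectivity the rows above and below are spread sideways too.
    source = horiz if mode == 8 else fig
    out = []
    for j, word in enumerate(horiz):
        above = source[j - 1] if j > 0 else 0
        below = source[j + 1] if j + 1 < len(fig) else 0
        out.append(word | above | below)
    return out


def conn(figa, seed, mode):
    """Component(s) of figa containing elements of seed."""
    figc = [a & s for a, s in zip(figa, seed)]
    while True:
        nxt = [g & a for g, a in zip(grow(figc, mode), figa)]
        if nxt == figc:
            return figc
        figc = nxt


def dissect(infig, maxknt, mode):
    """Return a list of connected components, at most maxknt of them."""
    remaining = list(infig)
    nfigs = []
    while any(remaining) and len(nfigs) < maxknt:
        j = next(i for i, w in enumerate(remaining) if w)
        seed = [0] * len(remaining)
        seed[j] = 1 << (remaining[j].bit_length() - 1)   # one seed element
        component = conn(remaining, seed, mode)
        nfigs.append(component)
        # Remove the component found from the input image.
        remaining = [w & ~c for w, c in zip(remaining, component)]
    return nfigs


image = rows_from(["1000", "0100", "0001"])
print(len(dissect(image, 10, 8)), len(dissect(image, 10, 4)))
```

The diagonal pair forms one component under 8-connectivity and two under
4-connectivity, so the two calls yield different counts.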
5. Subroutines GROW8 and GROW4

Subroutine GROW8 (INFIG, OUTFIG) expands the figure INFIG by one
element in each of the eight major directions. GROW8 performs an
operation equivalent to the operation applied to FIGC in the second step
of CONN8. GROW8 is useful for finding parts of an image field immediately
adjacent to a given area. GROW4 is a routine analogous to GROW8, but
involving 4-connectivity.

A routine SHRINK, which strips away the outer layer of a figure,
could be devised. SHRINK is, in a rough sense, the inverse of GROW.
The two operations are not truly inverse, however, nor are they
commutative with each other. For example, the sequence (SHRINK, GROW)
eliminates isolated figure points, thus changing the image.

SHRINK can be realized by applying GROW to the complement of the
figure to be shrunk, obtained with CMPFIG, and then complementing the
result. Just as there are 4-connected and 8-connected versions of GROW
there could be analogous versions of SHRINK.
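
A short Python sketch illustrates SHRINK built from GROW and
complementation, and shows that the sequence (SHRINK, GROW) does not
restore the image. The representation (integer row words, 24 bits per
row) and the names grow8, cmpfig, and shrink8 are assumptions made for
illustration. Area outside the image field is not treated as ground
here, so elements on the border of the field are not stripped from the
outside.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1


def rows_from(lines, width=WIDTH):
    return [int(line.ljust(width, "0"), 2) for line in lines]


def grow8(fig):
    """Expand the figure by one element in each of the eight directions."""
    spread = [(w | (w << 1) | (w >> 1)) & MASK for w in fig]
    out = []
    for j, word in enumerate(spread):
        if j > 0:
            word |= spread[j - 1]
        if j + 1 < len(spread):
            word |= spread[j + 1]
        out.append(word)
    return out


def cmpfig(fig):
    """Figure-ground complementation within the image field."""
    return [~w & MASK for w in fig]


def shrink8(fig):
    """Strip the outer layer: grow the ground, then complement back."""
    return cmpfig(grow8(cmpfig(fig)))


# A 3 X 3 block plus one isolated figure point.
image = rows_from(["", "01110", "01110", "01110", "", "", "0000000001", ""])
block = rows_from(["", "01110", "01110", "01110", "", "", "", ""])

print(grow8(shrink8(image)) == block)   # the isolated point has been removed
```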

The GROW and SHRINK operations are quantized analogs of the "grass-fire"
method  of  Dr.  Harry  Blum  of  Air  Force  Cambridge  Research  Labs. 

This description will be resumed in a future report, with a
description of routines for finding the perimeter, convex hull, concavities,
enclosures, and strokes of a connected figure.

IV INITIAL DEVELOPMENT OF A FORTRAN SYNTAX ANALYZER


A. Introduction

Some brief experiments, described in the Second Quarterly Report,
indicated that humans achieve error rates in the range from one to five
percent when presented with hand-printed characters in random order.
When presented with text material--i.e., printed matter organized into
words, sentences, equations, etc.--humans achieve error rates of a small
fraction of one percent. Clearly, the human makes use of context in
recognizing the individual characters, and it is obvious that a
successful FORTRAN text reader will have to do likewise. Accordingly, we
undertook the development of a FORTRAN syntax analyzer, which would accept
partially misidentified input from the single-character classifier and
produce clean text.

The  word  "syntax"  refers  to  the  formal  grammar  of  the  FORTRAN 
language;  hence,  the  syntax  analyzer  would  make  use  of  the  fact  that 
every statement in FORTRAN must obey the rules of FORTRAN grammar. One
can also investigate the use of context. The word "context," as opposed
to "syntax," refers to the fact that a particular word or character must
fit in with the words and characters surrounding it in order for the
whole to make sense. Thus, one could construct a statement in FORTRAN
(or  in  any  natural  or  computer  language,  for  that  matter)  which  obeyed 
the  rules  of  grammar  (syntax)  but  made  no  sense  because  some  of  the 
words  were  meaningless  in  their  particular  context.  It  is  anticipated 
that  the  text  analyzer  will  eventually  make  use  of  contextual  information 
as  well,  but  the  current  effort  emphasizes  the  grammatical  aspects  of 
the  FORTRAN  language. 




B. Structure of the Syntax Analyzer


1. Input to the Analyzer

A  pattern  classifier  is  usually  thought  of  as  a  device  that  accepts 
a  pattern  and  decides  which  class  it  belongs  to.  In  order  to  make  the 
most  efficient  use  of  the  syntax  analyzer,  however,  the  classifier  will 
produce  not  a  single  decision,  but  a  list  of  alternative  decisions. 
Moreover,  each  alternative  will  be  accompanied  by  a  number  giving  the 
confidence in that alternative. For example, if the original character
(true class) was the letter "O", the classifier might produce the list

    ((D 40) (O 30) (Q 10) (0 10) (P 10)),

meaning that the classifier decided that the character was a "D" with
confidence 40, an "O" with confidence 30, a "Q" with confidence 10, the
numeral "0" with confidence 10, or a "P" with confidence 10. We have
called such lists L-lists.

The  number  of  alternatives  for  any  given  character  to  be  recognized 
will  vary,  depending  upon  how  uncertain  the  classifier  was.  If  the 
classifier  used  is  the  9-view  piecewise  linear  machine  described  in 
previous reports, for example, then each view might contribute 10% to
the total confidence. (Normalization to 90%, 100%, or any other number
is immaterial, since the analyzer deals with relative confidence levels.)

A single FORTRAN statement would be represented at the input of the syntax
analyzer by a list of L-lists, one for each character of the text, where
each L-list has the form of the example given above. We have called
such lists P-lists. A P-list is the basic input to the syntax analyzer.
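
For concreteness, the two structures can be written down in Python. This
sketch is only one possible representation: L_LIST_O is the example from
the text, while L_LIST_D, the function names, and the choice of lists of
(character, confidence) pairs are assumptions made here.

```python
# L-list from the text: alternatives for one character, with confidences.
L_LIST_O = [("D", 40), ("O", 30), ("Q", 10), ("0", 10), ("P", 10)]

# Hypothetical L-list for a preceding character, invented for this example.
# Its confidences total 90, as nine views at 10 each would give.
L_LIST_D = [("D", 50), ("O", 20), ("0", 20)]

# A P-list is a list of L-lists, one per character of the statement.
P_LIST = [L_LIST_D, L_LIST_O]


def first_choice_string(p_list):
    """String formed from the most confident alternative of each L-list."""
    return "".join(max(l_list, key=lambda alt: alt[1])[0] for l_list in p_list)


def rescale(l_list, total=100):
    """Renormalize confidences; the relative ordering is unchanged."""
    current = sum(conf for _, conf in l_list)
    return [(ch, conf * total / current) for ch, conf in l_list]


print(first_choice_string(P_LIST))
print(rescale(L_LIST_D))
```

With these values the first-choice string is DD even though the writer
intended DO, which is exactly the kind of error the syntax analyzer is
meant to repair by consulting the lower-ranked alternatives.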

2.  Breakdown  by  Statement  Types 

The FORTRAN language is divided into approximately 35 different
statement types. Some of the more common types are the DO statement,
arithmetic assignment statement, GO TO statement, and IF statement. The
analyzer attempts first to find the statement type that the P-list
belongs to, and then calls in a "specialist" program to clean up that
statement type and produce the final answer. The determination of the
statement type is based on the fact that the syntax of each type, with
one exception, requires that the statement begin with a special control
word or words. In the examples above, DO, IF, and GO TO are the control
words. The arithmetic assignment statement is the single exception.
Thus, the analyzer first finds the average confidence of a match with
each of the FORTRAN control words. If the match is sufficiently high
with a given control word, then the statement type corresponding to that
control word is assumed. If no match is sufficiently high, then the
arithmetic assignment statement is assumed. A detailed explanation
and theoretical justification for this procedure is given in the Appendix.
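
The selection of a statement type might be sketched in Python as below.
The procedure actually used is specified in the Appendix and is not
reproduced here, so everything in this sketch is an assumption: the short
list of control words (with blanks dropped), the definition of the
average as the mean confidence assigned to each letter of the control
word, the threshold value, and all names.

```python
# Small illustrative subset of control words; blanks are dropped.
CONTROL_WORDS = ["DO", "IF", "GOTO"]


def match_confidence(p_list, word):
    """Average confidence that the P-list begins with `word`."""
    if len(p_list) < len(word):
        return 0.0
    total = 0
    for l_list, ch in zip(p_list, word):
        # Confidence given to this letter, or 0 if it is not an alternative.
        total += dict(l_list).get(ch, 0)
    return total / len(word)


def statement_type(p_list, threshold=25):
    """Pick the best-matching control word, else assume an assignment."""
    best = max(CONTROL_WORDS, key=lambda word: match_confidence(p_list, word))
    if match_confidence(p_list, best) >= threshold:
        return best
    return "ARITHMETIC ASSIGNMENT"


p_list = [
    [("D", 40), ("O", 30)],
    [("D", 40), ("O", 30), ("Q", 10)],
    [("1", 60)],
]
print(statement_type(p_list))
```

Here DO is selected with an average of 35 even though "O" is only the
second choice for the second character.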

3.  Specialist  Programs 

Once the statement type has been determined, the specialist program
for that type must be called in to produce the final clean FORTRAN
statement. We are currently in the process of writing these programs,
and to date have completed eight. Of these eight, three are
representative of the difficulty we expect to encounter in writing the
remainder.

It  is  difficult  to  describe  these  programs  in  detail  without  first 
specifying  the  syntax  of  each  statement  type.  Loosely  speaking,  however, 
the  specialist  programs  try  to  break  the  P-list  into  small  pieces  by 
attempting to find delimiters called for by the syntax. Thus, for
example,  if  the  syntax  of  a  given  statement  type  calls  for  a  comma  at 
a  certain  place,  the  program  will  look  for  the  existence  of  a  possible 
comma,  and  see  if  the  pieces  on  each  side  can  be  made  into  the  appropriate 
segments  of  the  statement.  If  they  can  be,  they  are;  otherwise,  we 
continue  searching  for  a  possible  comma.  This  breakup  process  can  be 
carried  out  only  to  a  certain  degree  of  fineness;  beyond  that  point, 
one  must  examine  a  segment  of  the  P-list  as  an  entity,  and  try  to  make 
sense  of  it.  Examples  of  these  "entities"  are  variable  names,  numbers 
(not  necessarily  single  digits),  and  arithmetic  expressions. 

At this level in the program we again appeal to the confidence
attached to each alternative. For any segment of a P-list, we can
find the string of characters it most confidently represents by simply
choosing the most confident alternative for each character. Similarly,
one could find the second most confident string, third most confident
string, etc. Thus, if we arrive at a point where a segment of the P-list
must be examined as an entity, we consider the most likely string of
characters, the second most likely, etc., until either a string is
found which agrees with the FORTRAN syntax or we are forced to stop
because the combinatorial growth in the number of possible strings
exceeds our computing power. This process is essentially the same as
the method in determining statement type, and the analysis presented
in the Appendix applies here as well. The problem of finding the 1st,
2nd, 3rd, ... most confident string of characters is by no means a
trivial problem. A solution to this problem involving a modification
of the technique known as dynamic programming was proposed by R. E. Larson
of SRI and is described in the next section. A program implementing this
solution is currently being written.

4. Dynamic Programming

Consider the following P-list that might have been produced by the
classifier working on the list of integers 19,8:

    ( ( (/ 60) (1 30) )
      ( (, 50) (9 40) )
      ( (, 50) (7 30) (9 10) )
      ( (B 40) (8 30) (3 20) ) ).

This P-list indicates that the first character was classified as a slash
with confidence 60 and as a one with confidence 30, etc. By taking the
first choice for each of the four characters, we obtain the string /,,B
having the maximum confidence, 200. A brief examination shows that
there are two strings having confidence 190--namely /,,8 and /9,B--but
even with this simple problem it soon becomes difficult to find all
strings of confidence 180, 170, 160, etc.

The dynamic programming solution to this problem uses only the matrix
of confidences:

    C =  60  30   -
         50  40   -
         50  30  10
         40  30  20.

In general, the ij-th element of this matrix is the confidence associated
with choosing the j-th alternative for the i-th member of the string. A
selection of a particular string corresponds to a function j(i), and is
called a policy, and for each policy there is a total confidence. Our
problem is to rank the policies so that if m < n then the total confidence
for the n-th policy is less than or equal to that for the m-th policy.

This ranking is accomplished in two steps. First the possible choices
for each row of C--i.e., for each stage of the decision process--are
considered and the possible partial confidences are systematically
recorded. This is done for each stage in succession until all of the
possible total confidences have been obtained. Second, the total
confidences are considered in succession and all of the possible policies
yielding those total confidences are obtained.

The details of this procedure are best described by using our simple
example. Consider the possibilities of Stage 1. The first decision
yields a partial confidence of 60 and the second yields a partial
confidence of 30. These values are recorded as the two lower-left-most nodes
of the graph in Fig. 13. Here the numbers by the two lower-most branches
indicate which decision was made at Stage 1, and the numbers inside the
circled nodes indicate the number of policies that yield the corresponding
partial confidences.

Now consider Stage 2. Had we reached the partial confidence of 60
from Stage 1, at Stage 2 the first decision would yield a partial
confidence of 60 + 50 = 110 and the second would yield 60 + 40 = 100; on the
other hand, had we only reached 30, the results of the Stage 2 decision
would be either 80 or 70. All four of these partial confidences obtainable
at Stage 2 are shown on the graph.

So far the basic advantage of this approach has not become apparent,
since all possible combinations of decisions and partial confidences are
exhaustively represented. Were there D possible decisions at each stage,
one might fear that an N-stage process would have to show all possible
results explicitly. The discrete nature of the process prevents this
from happening, however. In particular, note that at Stage 3 there are
two ways of obtaining a partial confidence of 130, either by using the
policy j(1)=1, j(2)=2, j(3)=2, or the policy j(1)=2, j(2)=1, j(3)=1.
Systematic consideration of the partial confidences that can be obtained
at Stage 3 finally leads to the complete set of total confidences that
can be obtained at Stage 4 shown on the graph. Note that there are only
eleven distinct total confidences, which is considerably fewer than the
2 X 2 X 3 X 3 = 36 possibilities that could be obtained if all of the
possible combinations of choices gave different total confidences. In
general, when the individual confidences are integers, there can be no
more total confidences than the value of the maximum total confidence,
and this usually represents a considerable reduction in computation.
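
The first step of the procedure, the stage-by-stage recording of partial
confidences, can be sketched in Python. The sketch is an illustration and
not the program being written: the function name stage_counts and the use
of a Counter to hold, for each partial confidence, the number of policies
reaching it (the numbers inside the nodes of the graph) are choices made
here.

```python
from collections import Counter

# Matrix of confidences for the example; rows are stages.
C = [
    [60, 30],
    [50, 40],
    [50, 30, 10],
    [40, 30, 20],
]


def stage_counts(matrix):
    """For each stage, map partial confidence -> number of policies."""
    stages = []
    current = Counter({0: 1})          # before Stage 1: one empty policy
    for row in matrix:
        nxt = Counter()
        for partial, n_policies in current.items():
            for confidence in row:
                # Policies that reach the same partial confidence merge
                # into a single node.
                nxt[partial + confidence] += n_policies
        stages.append(nxt)
        current = nxt
    return stages


stages = stage_counts(C)
print(stages[2][130])                  # ways of reaching 130 at Stage 3
print(len(stages[3]))                  # distinct total confidences
print(sum(stages[3].values()))         # total number of policies
```

The work at each stage is bounded by the number of distinct partial
confidences rather than by the number of policies.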

To find all of the policies yielding the highest total confidence,
one merely traverses the graph from the corresponding terminal node
back to the origin. In this case, there is only one optimal policy:
j(4)=1, j(3)=1, j(2)=1, j(1)=1. Two policies yield the next highest total
confidence: j(4)=2, j(3)=1, j(2)=1, j(1)=1 and j(4)=1, j(3)=1, j(2)=2,
j(1)=1. Continuing in this way, one obtains the policies in order of
descending total confidence, and results down to confidence 150 are
listed in Table II.

This  example  illustrates  two  important  considerations.  First,  this 
systematic  procedure  will  indeed  yield  candidate  strings  that  can  be 
tested  for  syntactic  legality  in  order  of  confidence.  Thus,  for  example, 
the  19th  policy  yields  the  string  19,8  which  is  the  syntactically  valid 
integer  list  having  highest  confidence.  There  are  several  other  legal 
integer  lists,  such  as  1,78  and  1978,  but  they  all  have  lower  total  con¬ 
fidence  and  need  not  be  considered. 
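
The ordered search just described can be sketched in a few lines of Python. This is a minimal illustration and not the program used in the study: it enumerates all 36 policies by brute force and sorts them rather than traversing the graph, the per-position symbols and confidences are read off the example and Table II, and the regular expression used as the legality test for an integer list is an assumption.

```python
import itertools
import re

# (symbol, confidence) alternatives for each of the four character positions,
# as inferred from the example and Table II (assumed values).
L_LISTS = [
    [("/", 60), ("1", 30)],
    [(",", 50), ("9", 40)],
    [(",", 50), ("7", 30), ("9", 10)],
    [("B", 40), ("8", 30), ("3", 20)],
]

# Hypothetical legality test: digit groups separated by single commas.
INTEGER_LIST = re.compile(r"\d+(,\d+)*")


def policies_by_confidence(l_lists):
    """Return (total confidence, string) pairs in descending confidence."""
    candidates = []
    for choice in itertools.product(*l_lists):
        string = "".join(symbol for symbol, _ in choice)
        total = sum(confidence for _, confidence in choice)
        candidates.append((total, string))
    candidates.sort(key=lambda item: -item[0])  # stable: ties keep their order
    return candidates


def first_legal(candidates):
    """Return (rank, total, string) of the first legal candidate, or None."""
    for rank, (total, string) in enumerate(candidates, start=1):
        if INTEGER_LIST.fullmatch(string):
            return rank, total, string
    return None


candidates = policies_by_confidence(L_LISTS)
print(first_legal(candidates))
```

The rank reported for the first legal string depends on how ties at equal confidence are ordered, so it need not coincide with the numbering of Table II; the string and its total confidence do not depend on the tie order.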

Secondly, however, it is clear that this procedure, as it stands,
will generate many strings that have no hope of being legal integer lists.
Its efficiency could be improved markedly by doing such things as deleting
from the P-list any symbols other than digits or commas, retaining in
each L-list only the digit having highest confidence, or perhaps
even incorporating a dictionary of possible statement labels obtained
from other parts of the program and considering only combinations of
digits that correspond to possible labels. All of these modifications
are specific to integer lists, however, and it would be a digression to
discuss them further. The major conclusion is that the dynamic programming
technique, together with constraints appropriate to the particular
problem, offers a systematic, general method of obtaining the legal
string of characters having highest total confidence.

Table II

FIRST TWENTY-ONE POLICIES

Total                        Policy               Corresponding
Confidence   Number   j(1)  j(2)  j(3)  j(4)      String

   200          1       1     1     1     1        /,,B

   190          2       1     1     1     2        /,,8
                3       1     2     1     1        /9,B

   180          4       1     1     1     3        /,,3
                5       1     2     1     2        /9,8
                6       1     1     2     1        /,7B

   170          7       1     2     1     3        /9,3
                8       1     1     2     2        /,78
                9       1     2     2     1        /97B
               10       2     1     1     1        1,,B

   160         11       1     1     2     3        /,73
               12       1     2     2     2        /978
               13       2     1     1     2        1,,8
               14       1     1     3     1        /,9B
               15       2     2     1     1        19,B

   150         16       1     2     2     3        /973
               17       2     1     1     3        1,,3
               18       1     1     3     2        /,98
               19       2     2     1     2        19,8
               20       1     2     3     1        /99B
               21       2     1     2     1        1,7B
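
The first of the modifications suggested above, deleting every symbol other than a digit or a comma, can be sketched as follows. The symbol lists and confidences are those inferred from the example and Table II, the legality test is a hypothetical regular expression, and the brute-force enumeration stands in for the graph traversal; none of this is claimed to be the study's implementation.

```python
import itertools
import re

# (symbol, confidence) alternatives per position (assumed values).
L_LISTS = [
    [("/", 60), ("1", 30)],
    [(",", 50), ("9", 40)],
    [(",", 50), ("7", 30), ("9", 10)],
    [("B", 40), ("8", 30), ("3", 20)],
]

ALLOWED = set("0123456789,")               # symbols an integer list may contain
INTEGER_LIST = re.compile(r"\d+(,\d+)*")   # hypothetical legality test


def prune(l_lists):
    """Drop symbols that can never appear in a legal integer list."""
    return [[(s, c) for s, c in l_list if s in ALLOWED] for l_list in l_lists]


def count_candidates(l_lists):
    """Number of strings the unrestricted enumeration would generate."""
    return len(list(itertools.product(*l_lists)))


def best_legal(l_lists):
    """Return (string, total confidence) of the best legal string, or None."""
    ordered = sorted(
        itertools.product(*l_lists),
        key=lambda choice: -sum(c for _, c in choice),
    )
    for choice in ordered:
        string = "".join(s for s, _ in choice)
        if INTEGER_LIST.fullmatch(string):
            return string, sum(c for _, c in choice)
    return None  # also reached when pruning empties one of the lists


pruned = prune(L_LISTS)
print(count_candidates(L_LISTS), count_candidates(pruned))
print(best_legal(L_LISTS), best_legal(pruned))
```

Pruning changes only the number of candidates that have to be generated and tested; the legal string of highest total confidence is the same before and after.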




C.  Experimental  Results


Although the syntax analyzer is far from complete and the final form
of the classifier has not yet been determined, we decided to do a small
experiment to see whether the basic approach of the analyzer was sound. Five
handprinted GO TO statements were written and scanned using the TV camera.
The patterns were preprocessed using the simulation of the 1024-image
preprocessor, and classified using the nine-view piecewise-linear learning
machine. Each vote for a class accrued a confidence of 10 for that class.
Since the analyzer makes use of spaces in the text and the current scan
program does not output spacing information, this information was inserted
manually. This is a small point, since we believe that a simple modification
of the scan program will enable us to obtain space information
with essentially 100% reliability.

The classifier was considered to be incorrect whenever its first
(i.e., most confident) choice was in error. On this basis, the classifier
made 15 errors out of a total of 34 characters, for an error rate of 44%,
which is considerably higher than is usual for this classifier. At the
output of the syntax analyzer, however, the error rate dropped to 3%.

We reproduce the experimental results for the first statement in
their entirety:

Original statement
GO TO 1150

P-list returned by classifier

(((F 40) (E 10) (G 10) (9 10) (- 10) (5 10))
 ((O 40) (C 20) (2 10) (0 10) (, 10))
 ((SP 100))
 ((T 60) (- 30))
 ((3 30) (O 30) (C 20) (P 10))
 ((SP 100))
 ((1 60) (/ 30))
 ((1 70) (/ 20))
 ((5 90))
 ((P 50) (/ 20) (0 10)))




First choices of classifier
FO T3 115P

Output of syntax analyzer
GO TO 115P

The experimental results for the remaining four statements are summarized
below:


Statement 2                  GO    TO    2<S
Classifier's 1st choices     r>c   T:    2<i
Analyzer output              GO    TO    2<>

Statement 3                  GO    TO    9HH
Classifier's 1st choices     -O    TC    7 H
Analyzer output              GO    TO    7 HH

Statement 4                  GO    TO    59
Classifier's 1st choices     KO    FC    59
Analyzer output              GO    TO    59

Statement 5                  GO    TO    123
Classifier's 1st choices     GO    TC    /23
Analyzer output              GO    TO    123

D.  Conclusion 

The experiment just described illustrates dramatically the power of
syntax analysis for cleaning up text in which individual characters are
misclassified. It is to be recognized, of course, that only a portion of
the syntax analysis program has been developed to date, and that correspondingly
only a limited specimen of text representing one of the simplest
FORTRAN statement types was used in the experiment. We are currently
expanding the syntax analyzer, and this activity will continue with the
objective of encompassing the several FORTRAN statement types. At a later
date we plan to begin incorporating contextual analysis beyond the purely
syntactical.

This small experiment also represents a milestone, in that for the
first time we have carried actual text from a coding sheet through
scanning, preprocessing, classification, and syntax analysis. Thus
we have, in an embryonic sense, demonstrated the complete chain of
analysis.
Appendix

A DECISION-THEORETIC FRAMEWORK FOR THE SYNTAX ANALYZER


In Sec. IV of this report we described the ways in which we are using
the syntax of FORTRAN to reduce the error rate in recognizing handprinted
text. Abend⁶,⁷ has recently pointed out that compound decision theory
provides a natural mathematical framework for incorporating context in
pattern recognition. Unfortunately, a rigorous implementation of compound
decision theory requires the estimation of too many high-order
joint probabilities to warrant considering this approach seriously.
Nevertheless, viewing the problem from the vantage point of statistical
decision theory serves to clarify the problem and partially to justify
our more pragmatic approach.

As a prelude to considering the compound decision problem, consider
the problem of classifying a single pattern represented by a set of
measurements. These measurements, which might indicate such things as
the presence or absence of edges, corners, etc., make up the components
of a pattern vector $x$. For any given "pattern" $x$, we must pick a
category $\theta$, where $\theta$ designates one of the possible categories. For the
case where $x$ is an allowable FORTRAN character, $\theta$ can assume any of 46
different values, corresponding to the 26 letters, 10 numerals, and 10
special characters.

It is well known⁸ that in order to obtain a minimum-error-rate
classifier, one should compute $p(\theta \mid x)$ for every possible value of $\theta$,
and choose that value of $\theta$ for which $p(\theta \mid x)$ is maximum. By Bayes' rule,

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} , \tag{A-1}$$

where $p(\theta)$ is the a priori probability for $\theta$. Thus the optimum procedure
reflects the fact that some characters appear more frequently than others
through the presence of $p(\theta)$ in the numerator of Eq. (A-1).

In most of our work to date, we have trained classifiers such as
piecewise-linear machines to give as low an error rate as we could obtain.
We have tacitly assumed the a priori probabilities $p(\theta)$ to be equal,
inasmuch as we have represented each character type with equal frequency
in both the training and the testing data. Thus, to the extent that the
performance of these classifiers is optimum, we can say that for any
pattern $x$ we compute the 46 functions

$$p^*(\theta \mid x) = \frac{p(x \mid \theta)}{46\, p(x)} \tag{A-2}$$

and assign $x$ to the category for which $p^*(\theta \mid x)$ is maximum.
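
The effect of the a priori probabilities in Eq. (A-1), and of setting them equal as in Eq. (A-2), can be illustrated with a small Python sketch. The three categories and all numerical values are hypothetical and chosen only for illustration (the report deals with 46 categories), and the function name is an assumption.

```python
def posterior(likelihood, prior):
    """Apply Bayes' rule, Eq. (A-1), to dictionaries keyed by category."""
    # p(x) is the sum over categories of p(x | theta) p(theta).
    evidence = sum(likelihood[k] * prior[k] for k in likelihood)
    return {k: likelihood[k] * prior[k] / evidence for k in likelihood}


# Hypothetical values of p(x | theta) for one observed pattern x.
LIKELIHOOD = {"O": 0.30, "0": 0.35, "Q": 0.05}

# Equal priors, as tacitly assumed when training; this gives p*(theta | x).
EQUAL = {k: 1 / len(LIKELIHOOD) for k in LIKELIHOOD}

# Hypothetical unequal priors, e.g. the letter O being more common than zero.
SKEWED = {"O": 0.6, "0": 0.3, "Q": 0.1}

p_star = posterior(LIKELIHOOD, EQUAL)
p_true = posterior(LIKELIHOOD, SKEWED)

print(max(p_star, key=p_star.get))  # category chosen under equal priors
print(max(p_true, key=p_true.get))  # category chosen under the skewed priors
```

With equal priors the decision follows the likelihood alone; with unequal priors the same measurements can lead to a different category, which is the effect that the factor $p(\theta)$ in Eq. (A-1) expresses.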

Consider now the compound decision problem resulting from scanning
a syntactically valid FORTRAN statement containing, say, $m$ characters.
The result is to obtain a set of $m$ pattern vectors $x_1, x_2, \ldots, x_m$, which
can be thought of as the components of another vector, $\mathbf{x}$. Our problem
is to select a corresponding set of categories $\theta_1, \theta_2, \ldots, \theta_m$, i.e.,
a vector $\boldsymbol{\theta}$, such that $p(\boldsymbol{\theta} \mid \mathbf{x})$ is maximum. Were the $\theta$'s statistically
independent, we would merely select the best $\theta_i$ for each $x_i$. However, the
very reason for considering this problem is that the syntax constraints
prevent the $\theta$'s from being independent and allow us to obtain a better
decision by considering the compound problem than can be obtained from
considering each of the component problems separately.

There is one kind of independence assumption we can invoke, however,
to obtain a significant simplification of the results. We assume that for
any $i$ the $i$th set of measurements, $x_i$, depends solely on $\theta_i$, the category
of the $i$th character, i.e., that

$$p(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_m, \boldsymbol{\theta}) = p(x_i \mid \theta_i) . \tag{A-3}$$


This is actually a very reasonable assumption: it merely states that
the types of variations seen in a handprinted character (say, the letter A)
depend only upon the fact that it is an A, and not at all upon the fact
that it is surrounded by other characters. The fact that certain strings
of characters are illegal, or rarely occur, will be introduced through
the a priori probability $p(\boldsymbol{\theta})$.


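As was stated previously, to obtain a minimum-error-rate classifier,
we must compute $p(\boldsymbol{\theta} \mid \mathbf{x})$ for all possible vectors $\boldsymbol{\theta}$ and select the $\boldsymbol{\theta}$ for
which $p(\boldsymbol{\theta} \mid \mathbf{x})$ is maximum. By Bayes' rule,

$$p(\boldsymbol{\theta} \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathbf{x})} . \tag{A-4}$$

It is easy to show by induction that our conditional independence assumption,
given by Eq. (A-3), leads to

$$p(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_{i=1}^{m} p(x_i \mid \theta_i) . \tag{A-5}$$

Substituting this and Eq. (A-2) into Eq. (A-4) yields

$$p(\boldsymbol{\theta} \mid \mathbf{x}) = \frac{p(\boldsymbol{\theta})}{p(\mathbf{x})} \prod_{i=1}^{m} \left[ 46\, p(x_i)\, p^*(\theta_i \mid x_i) \right] . \tag{A-6}$$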

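This can be simplified further, however, since we are only interested
in the variation of $p(\boldsymbol{\theta} \mid \mathbf{x})$ with $\boldsymbol{\theta}$. Thus, if we drop constants and factors
dependent solely on $\mathbf{x}$, we obtain an equivalent compound decision rule by
selecting that $\boldsymbol{\theta}$ for which

$$q(\boldsymbol{\theta}, \mathbf{x}) = p(\boldsymbol{\theta}) \prod_{i=1}^{m} p^*(\theta_i \mid x_i) \tag{A-7}$$

is maximum.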
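
For reference, the steps leading from Bayes' rule to this decision rule can be written out as a single chain. This is only a restatement of the equations of this appendix under the conditional independence assumption; the grouping of the factors that do not depend on $\boldsymbol{\theta}$ is added here for clarity.

```latex
\begin{align*}
p(\boldsymbol{\theta}\mid\mathbf{x})
  &= \frac{p(\mathbf{x}\mid\boldsymbol{\theta})\,p(\boldsymbol{\theta})}{p(\mathbf{x})}
  && \text{Bayes' rule} \\
  &= \frac{p(\boldsymbol{\theta})}{p(\mathbf{x})}\prod_{i=1}^{m} p(x_i\mid\theta_i)
  && \text{conditional independence} \\
  &= \frac{p(\boldsymbol{\theta})}{p(\mathbf{x})}\prod_{i=1}^{m} 46\,p(x_i)\,p^{*}(\theta_i\mid x_i)
  && \text{since } p(x_i\mid\theta_i) = 46\,p(x_i)\,p^{*}(\theta_i\mid x_i) \\
  &= \underbrace{\frac{46^{m}\prod_{i=1}^{m}p(x_i)}{p(\mathbf{x})}}_{\text{independent of }\boldsymbol{\theta}}
     \; p(\boldsymbol{\theta})\prod_{i=1}^{m}p^{*}(\theta_i\mid x_i).
\end{align*}
```

Because the bracketed factor is positive and the same for every $\boldsymbol{\theta}$, maximizing the posterior over $\boldsymbol{\theta}$ is the same as maximizing the remaining product.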

Consider now the application of this result to a segment of a FORTRAN
statement that we wish to treat as an entity, such as a number or a list
of integers separated by commas. Any given $\boldsymbol{\theta}$ corresponds to a string of
$m$ characters and either is or is not a syntactically valid entity. We
assume that we are always dealing with valid FORTRAN statements, and
hence that $p(\boldsymbol{\theta}) = 0$ for invalid entities. Thus we need only consider $\boldsymbol{\theta}$'s
corresponding to valid strings.

What can we assume about $p(\boldsymbol{\theta})$ for valid strings? In some circumstances,
it seems reasonable to invoke the principle of insufficient
reason and consider all valid strings to be equally likely. For example,
if the entity is supposed to be a number, there is little reason to expect
one number more than another. In such cases $p(\boldsymbol{\theta})$ is a constant factor
that does not influence the decision; one merely selects the $\boldsymbol{\theta}$ for the
valid string for which

$$\prod_{i=1}^{m} p^*(\theta_i \mid x_i)$$

is maximum. If we define the confidence $c(\theta_i, x_i)$ as the logarithm of
$p^*(\theta_i \mid x_i)$, this is equivalent to selecting the $\boldsymbol{\theta}$ for the valid string for
which the total confidence

$$\sum_{i=1}^{m} c(\theta_i, x_i)$$

is maximum. Thus the dynamic programming approach described in Sec. IV
is applicable in this situation. Strings are considered in order of
decreasing confidence until a valid string is encountered; this valid
string corresponds to the optimal statistical decision.
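
The equivalence between maximizing the product of the $p^*$ values and maximizing the sum of their logarithms can be checked numerically. The sketch below uses hypothetical $p^*$ values for a two-character entity and a hypothetical set of valid strings; it assumes every $p^*$ value involved is positive, since the logarithm of zero is undefined.

```python
import math

# Hypothetical p*(theta_i | x_i) values for a two-character entity.
P_STAR = [
    {"1": 0.6, "/": 0.3, "7": 0.1},
    {"O": 0.5, "0": 0.4, "8": 0.1},
]

# Hypothetical set of syntactically valid strings (two-digit numbers).
VALID = ["10", "18", "70", "78"]


def product_score(string):
    """Product of p* over the character positions of the string."""
    return math.prod(P_STAR[i][ch] for i, ch in enumerate(string))


def total_confidence(string):
    """Sum of confidences, each confidence being the logarithm of p*."""
    return sum(math.log(P_STAR[i][ch]) for i, ch in enumerate(string))


best_by_product = max(VALID, key=product_score)
best_by_confidence = max(VALID, key=total_confidence)

# The unconstrained first choices, for comparison with the valid optimum.
first_choices = "".join(max(p, key=p.get) for p in P_STAR)

print(best_by_product, best_by_confidence, first_choices)
```

Since the logarithm is monotonically increasing, both criteria always select the same valid string, while the unconstrained first choices need not form a valid string at all.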

There are other common situations, however, in which it is not at
all reasonable to consider all valid strings to be equally likely. When
dealing with an entire FORTRAN statement, for example, we are much more
likely to find the statement starting with a string like DIMENSION or
GO TO 913 than XMZQR = VUS. Thus, rather than generating strings of
successively lower confidence and rejecting them until we obtain a valid
string, we can immediately compute the confidence associated with those
strings for which $p(\boldsymbol{\theta})$ is known to be large a priori, namely, the control
words associated with the various statement types. If the string does
indeed happen to begin with XMZQR = VUS, then none of these strings is
likely to have a high confidence (that is, a confidence that is reasonably
close to the maximum possible confidence). On the other hand, if the
true string corresponds to any statement type other than an arithmetic
assignment statement, we are very likely to obtain a high-confidence
match with the corresponding control word(s) quickly, since we have to
consider only one $\boldsymbol{\theta}$ for each statement type. Since it is impossible,
in practice, to estimate the a priori probabilities for all syntactically
valid strings, some approximate procedure must be used in any
case, and this appears to be a very reasonable procedure that does not do
violence to the guiding ideas of compound decision theory.
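
The control-word test described above can be sketched in Python using the first five entries of the P-list reproduced in Sec. IV-C for the statement GO TO 1150. The table of control words is a hypothetical and incomplete subset, the reading of the O glyphs in the listing as the letter O is an assumption, and scoring each word by the ratio of its confidence to the maximum possible confidence is one assumed way of judging whether a confidence is "reasonably close to the maximum"; the report does not specify a particular measure.

```python
# First five entries of the P-list for the first statement of Sec. IV-C.
# The letter-O readings in the second and fifth entries are assumptions.
P_LIST_PREFIX = [
    [("F", 40), ("E", 10), ("G", 10), ("9", 10), ("-", 10), ("5", 10)],
    [("O", 40), ("C", 20), ("2", 10), ("0", 10), (",", 10)],
    [("SP", 100)],
    [("T", 60), ("-", 30)],
    [("3", 30), ("O", 30), ("C", 20), ("P", 10)],
]

# Hypothetical, incomplete table of control words, one symbol per position.
CONTROL_WORDS = {
    "GO TO": ["G", "O", "SP", "T", "O"],
    "DO": ["D", "O"],
    "IF": ["I", "F"],
}


def word_confidence(p_list, symbols):
    """Total confidence of reading `symbols` at the start of the P-list."""
    return sum(dict(l_list).get(s, 0) for s, l_list in zip(symbols, p_list))


def max_confidence(p_list, length):
    """Highest total confidence any string of this length could obtain."""
    return sum(max(c for _, c in l_list) for l_list in p_list[:length])


def rank_control_words(p_list, words):
    """Return (ratio, word) pairs, best match first."""
    scored = []
    for name, symbols in words.items():
        if len(symbols) > len(p_list):
            continue  # the word is longer than the available text
        ratio = word_confidence(p_list, symbols) / max_confidence(p_list, len(symbols))
        scored.append((ratio, name))
    return sorted(scored, reverse=True)


ranking = rank_control_words(P_LIST_PREFIX, CONTROL_WORDS)
print(ranking)
```

Only one candidate per statement type has to be scored, which is the saving the text points out; a string that matches no control word well would fall through to the more general treatment of arithmetic assignment statements.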